This is an attempt at a streamlined and yet complete, relatively a priori/non- snooped model analysis. Note that this version is an overhaul, with some missing components from the older results: see MixedEffects_older.Rmd

Some parts of the analysis have been moved into separate .R files: the overall workflow is

(mmd_procdata is out of date: Enric has processing machinery to extract lat/long of centroid of largest polygon (?) (x/y in the data set) and area …)

source("mmd_utils.R")
source("gamm4_utils.R")

Packages used/versions:

load_all_pkgs()
theme_set(theme_bw())
zmargin <- theme(panel.spacing=grid::unit(0,"lines"))
zmarginx <- theme(panel.spacing.x=grid::unit(0,"lines"))
library('pander')  ## for tables
sapply(pkgList,function(x) format(packageVersion(x)))
##         lme4        gamm4        bbmle        broom  broom.mixed 
##     "1.1.16"      "0.2.5"     "1.0.20"      "0.4.3"      "0.0.1" 
##      lattice    gridExtra      ggplot2       plotly     ggstance 
##    "0.20.35"        "2.3" "2.2.1.9000"      "4.7.1"        "0.3" 
##        dplyr        tidyr       tibble        remef       r2glmm 
##      "0.7.4"      "0.7.2"      "1.4.2" "1.0.6.9000"      "0.1.2"

Note that you now need the development version of lme4, i.e. devtools::install_github("lme4/lme4").

Data from Enric (includes area and lat/long coordinates):

L <- load("ecoreg.RData")

Limit all observations with all predictor variables > 0 and no biome 98/99; set biome as factor. Log and scale/center variables as appropriate. This leaves us with 29 variables and 620 observations.

Model fitting

This can now be done semi-automatically (i.e., fit all combinations of random effects) by using the function fit_all from the mmd_utils.R file (sourced above), e.g. fit_all(response="mbirds_log")

The mmd_fitbatch.R code has already run all random-effects model combinations for all four response variables, both with lme4 and with gamm4. mmd_reduce.R has reduced these to lists with components sum (summaries: AIC, singularity, etc.); coef (coefficients); and pred (fitted, residuals, etc.).

Load data:

load("allfits_sum_lmer.RData")  ## 4 taxa x 27 fits, using lmer
load("allfits_sum_gamm4.RData") ## 4 taxa x 27 fits, using gamm4

Table of models with AIC<10:

aictab1 <- lme4_res$sum %>%
    filter(AIC<8) %>%
    arrange(taxon,AIC) %>%
    mutate(AIC=round(AIC,1),
           model=shorten_modelname(model))
pander(select(aictab1,-best),emphasize.strong.rows=which(aictab1$best))
taxon model AIC singular df
mamph_log b=i/fr=i/b_FR=d 0 FALSE 20
mamph_log b=d/fr=i/b_FR=i 3 TRUE 20
mamph_log b=d/fr=i/b_FR=d 3.9 TRUE 24
mamph_log b=d/fr=d/b_FR=i 5.3 TRUE 24
mamph_log b=i/fr=d/b_FR=d 5.5 TRUE 24
mamph_log b=i/fr=i/b_FR=f 7.2 FALSE 30
mbirds_log b=i/fr=i/b_FR=d 0 FALSE 20
mbirds_log b=i/fr=d/b_FR=d 4.6 TRUE 24
mbirds_log b=i/fr=d/b_FR=f 5.7 TRUE 34
mbirds_log b=i/fr=i/b_FR=f 6.2 TRUE 30
mbirds_log b=d/fr=i/b_FR=d 7.2 TRUE 24
mmamm_log b=i/fr=d/b_FR=f 0 TRUE 34
mmamm_log b=i/fr=i/b_FR=f 3.3 TRUE 30
mmamm_log b=d/fr=d/b_FR=f 6.7 TRUE 38
mmamm_log b=i/fr=f/b_FR=f 7.4 TRUE 44
mmamm_log b=i/fr=d/b_FR=d 7.8 FALSE 24
plants_log b=i/fr=i/b_FR=d 0 FALSE 20
plants_log b=d/fr=i/b_FR=i 3 TRUE 20
plants_log b=d/fr=i/b_FR=d 3.3 TRUE 24
plants_log b=i/fr=i/b_FR=f 4.2 TRUE 30
plants_log b=i/fr=d/b_FR=d 6.3 TRUE 24
plants_log b=d/fr=i/b_FR=f 7.8 TRUE 34

Best (non-singular) models only:

lme4_best_sum <- lme4_res$sum %>%
    filter(best) %>% select(-c(best,singular))
gamm4_best_sum <- gamm4_res$sum %>%
    filter(best) %>% select(-c(best,singular))
all_best_sum <- bind_rows(list(lme4=lme4_best_sum,gamm4=gamm4_best_sum),
                          .id="type")
pander(all_best_sum)
type taxon model AIC df
lme4 plants_log biome=int/flor_realms=int/biome_FR=diag 0 20
lme4 mamph_log biome=int/flor_realms=int/biome_FR=diag 0 20
lme4 mbirds_log biome=int/flor_realms=int/biome_FR=diag 0 20
lme4 mmamm_log biome=int/flor_realms=diag/biome_FR=diag 7.835 24
gamm4 plants_log biome=int/flor_realms=diag/biome_FR=int 9.79 21
gamm4 mamph_log biome=int/flor_realms=int/biome_FR=int 22.63 17
gamm4 mmamm_log biome=int/flor_realms=diag/biome_FR=int 25.77 21
gamm4 mbirds_log biome=int/flor_realms=int/biome_FR=int 36.9 17

Our current strategy is to (1) take the best non-singular model fitted by lme4; (2) find the coefficients of the corresponding gamm4 model. (Below, we also try taking the best non-singular gamm4 model.)

lme4_best_models <- select(lme4_best_sum,c(taxon,model))
gamm4_best_models <- select(gamm4_best_sum,c(taxon,model))
## keep only predictions from best models
gamm4_best_pred <- gamm4_res$pred %>%
    right_join(lme4_best_models,by=c("taxon","model"))

Fitted vs residual for all four taxa:

ggplot(gamm4_best_pred,aes(.fitted,.resid))+
    geom_point()+geom_smooth()+
    facet_wrap(~taxon,nrow=1)+zmargin

A little bit heavy-tailed …

## https://stackoverflow.com/questions/40598011/how-to-customize-hover-information-in-ggplotly-object
ggqq <- ggplot(gamm4_best_pred,aes(sample=.resid,
                                   colour=biome,
                                   flor_realms=flor_realms))+
    stat_qq()+stat_qq_line(aes(group=taxon))+
    facet_wrap(~taxon,nrow=1)+zmargin
## print(ggqq)
ggplotly(ggqq,tooltip=c("biome","flor_realms"))

coefficient plot of all models tried

For example:

gamm4_allcoef <- get_allcoefs(gamm4_res,"plants_log") %>% add_wald_ci
lme4_allcoef <- get_allcoefs(lme4_res,"plants_log") %>% add_wald_ci
gg_allcoef <- ggplot(gamm4_allcoef,aes(estimate,model,colour=singular,shape=singular))+
    ## use point + linerange because some RE are missing std.errors
    geom_point()+
    geom_linerangeh(aes(xmin=lwr,xmax=upr))+
    facet_wrap(~term,scale="free_x")+
    geom_vline(xintercept=0,lty=2)+
    scale_colour_brewer(palette="Set1")+
    zmarginx

All gamm4 coefs for plants:

print(gg_allcoef)

all lme4 coefs for plants:

print(gg_allcoef %+% lme4_allcoef)

to do: why so many singular now … ?

to do: redo profiling/profile plots, comparison of Wald/profile CIs (at least for best models …

Check difference between Wald and likelihood profile confidence intervals. In principle profile CIs are more accurate - if the computations have run reliably … but I would probably be conservative and take the wider of the two.

Pull out coefs from best models for lme4, gamm4 (from lme4-best model), gamm4 (from gamm4-best model)

gamm4_best_coef <- gamm4_res$coef %>%
    right_join(lme4_best_models,by=c("taxon","model")) %>%
    add_wald_ci %>% drop_intercept
gamm4_best2_coef <- gamm4_res$coef %>%
    right_join(gamm4_best_models,by=c("taxon","model")) %>%
    add_wald_ci %>% drop_intercept
lme4_best_coef <- lme4_res$coef %>%
    right_join(lme4_best_models,by=c("taxon","model")) %>%
    add_wald_ci %>% drop_intercept
all_best_coef <- bind_rows(list(lme4=lme4_best_coef,gamm4=gamm4_best_coef,
                                gamm4_2=gamm4_best2_coef),
                           .id="type")

to do: plot all coefficients including std devs (need to combine group with term for random effects)

Exclude random effects and NPP_log (which is large and significant for all taxa). Abuse warning: coloring significant effects. Snooping warning: counting number of sig values for each model type!

There are not a lot of effects other than NPP_log that are consistently large and significant; perhaps main effects of fire for birds and mammals. Amphibians have considerably larger effects (the last three: Feat cv and its interactions).

Plotting functions

Load best non-singular gamm4 fits:

load("bestmodels_gamm4.RData")
names(best_models)
## [1] "plants_log" "mamph_log"  "mbirds_log" "mmamm_log"

plotfun() takes arguments:

ggplot1 <- plotfun(best_models[[1]])
print(ggplot1)

or via ggplotly():

ggplotly(ggplot1)

Partial residuals (not working yet):

fix_NAs <- function(rem,model) {
    if (!is.null(nastuff <- attr(model.frame(model),"na.action"))) {
        return(napredict(nastuff,rem))
    } else return(rem)
}
rem1 <- fix_NAs(remef(best_models[[1]]$mer,ran="all"),best_models[[1]])
if (length(rem1)==nrow(ecoreg)) {
    ## if it worked ...
    ecoreg$rem1 <- rem1
    ## update previous plot:
    plotfun(best_model,respvar="rem1")
}

to do